Issues in Anaphora Annotation: The Case of Encyclopaedic Animal Descriptions

Authors

  • Josef Meyer
  • Robert Dale
Abstract

Anaphora resolution has been a widely studied problem in natural language processing almost since work in the field began (see [Hirst 1981, Mitkov 1999] for summaries). However, most work focuses on the resolution of pronouns, or, at best, on a wider range of anaphoric expressions which co-refer with their antecedents. Real natural language texts exhibit a much richer range of anaphoric phenomena. In this paper, we describe an experiment in annotating a corpus for the variety of anaphors found, and indicate some of the problematic cases that robust natural language processing techniques have to deal with. As a result of this analysis we have identified several subtypes of associative anaphoric relationships that occur frequently in the corpus, and determined that it may be possible to identify the most common relationships using some fairly simple heuristics.

Similar resources

A Comparative Study of Spanish Zero Pronoun Distribution

The aim of this paper is to report the distribution of Spanish zero pronouns in three different genres: legal, encyclopaedic and instructional. The Z-corpora were created for this purpose and a sample of 1043 zero pronouns was annotated. The most salient patterns of distribution are compared for each genre, and some relevant issues concerning the use of zero pronouns are described in relation t...

Annotating Event Anaphora: A Case Study

In recent years we have registered a renewed interest in event detection and temporal processing of text/discourse. TimeML (Pustejovsky et al., 2003a) has shed new light on the notion of event and developed a new methodology for its annotation. In parallel, work on anaphora resolution has developed a reliable methodology for the annotation and pointed out the core role of this phenomenon f...

Multidimensional markup and heterogeneous linguistic resources

The paper discusses two topics: firstly an approach of using multiple layers of annotation is sketched out. Regarding the XML representation this approach is similar to standoff annotation. A second topic is the use of heterogeneous linguistic resources (e.g., XML annotated documents, taggers, lexical nets) as a source for semiautomatic multi-dimensional markup to resolve typical linguistic iss...

Anaphora Annotation in Hindi Dependency TreeBank

In this paper, we propose a scheme for anaphora annotation in Hindi Dependency Treebank. The goal is to identify and handle the challenges that arise in the annotation of reference relations in Hindi. We identify some of the issues related to anaphora annotation specific to Hindi such as distribution of markable span, sequential annotation, representation format, annotation of multiple referent...

ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions

This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, considerable effort has been invested in improving the data both quantitatively and qualitatively. Thus, we have doubled the corpus size, expanded the selection of covered phen...

Publication date: 2001